iteration number
002262941c9edfd472a79298b2ac5e17-Supplemental-Conference.pdf
A.1 Proof Sketch We first introduce the following lemma: Lemma 1. [...] Lemma 2. For matrices $A, B \in M_n$, if $A \preceq B$, then $\lambda_{\min}(A) \le \lambda_{\min}(B)$ and $\lambda_{\max}(A) \le \lambda_{\max}(B)$, where $\lambda_{\max}(\cdot)$ (resp., $\lambda_{\min}(\cdot)$) denotes the maximum (resp., minimum) eigenvalue. Proof of Lemma 2. For any matrix $P \in M_n$ with $P^\top = P$, we have $\lambda_{\max}(P) = \max$ [...] We first consider the condition number of $\hat{H}$ when $X$ is in a locally convex area. By equations 3 and 4, we have $M_1 \preceq H \preceq M_2$. Rearranging the terms yields $H - M_1 \succeq 0$ and $M_2 - H \succeq 0$. Therefore, for any vector $x \in \mathbb{R}^M$, we have [...] We next consider the minimum singular value of $H$ and $\hat{H}$, with $\sigma_{\min}(H) = \sqrt{\lambda_{\min}(H^2)}$ and $\sigma_{\min}(\hat{H}) = \sqrt{\lambda_{\min}(\hat{H}^2)}$ in any case. Under Assumption 1 and equation 4, we have $H \preceq M_2$. Similarly, we can obtain a matching relation between $H$ and $M_2$ [...]. By Lemma 2, we further have $\lambda_{\max}(H) \le \lambda_{\max}(M_2) =$ [...] C.1 $\|\nabla \hat{f}(\hat{X})\|_2$ vs. $\|\nabla f(X)\|_2$ In this section, we explain why we use $\|\nabla \hat{f}(\hat{X})\|_2$ rather than $\|\nabla f(X)\|_2$ to characterize the convergence rate. In general, it is hard to develop a convergence rate for objective values. However, when the global model is in a locally convex area of $f$, we can obtain the relationship between the gradient and the local optimum.
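Lemma 2 and the singular-value identity used in the sketch can be checked numerically. The snippet below is only an illustrative sanity check under stated assumptions: the matrices `A` and `B` are random symmetric matrices constructed so that `B - A` is positive semidefinite. They are hypothetical stand-ins, not the `H`, `M_1`, or `M_2` of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Random symmetric matrix A (hypothetical example data).
G = rng.standard_normal((n, n))
A = (G + G.T) / 2

# B = A + R R^T, so B - A is positive semidefinite, i.e. A precedes B
# in the Loewner order assumed by Lemma 2.
R = rng.standard_normal((n, n))
B = A + R @ R.T

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eig_A = np.linalg.eigvalsh(A)
eig_B = np.linalg.eigvalsh(B)

# Lemma 2: both the smallest and the largest eigenvalue are ordered.
tol = 1e-12
lemma2_holds = bool(eig_A[0] <= eig_B[0] + tol and eig_A[-1] <= eig_B[-1] + tol)

# For symmetric A, the minimum singular value equals sqrt(lambda_min(A^2)).
sigma_min = np.linalg.svd(A, compute_uv=False).min()
sqrt_lam_min = np.sqrt(max(np.linalg.eigvalsh(A @ A)[0], 0.0))

print(lemma2_holds, sigma_min, sqrt_lam_min)
```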
Node Dependent Local Smoothing for Scalable Graph Learning
Recent works reveal that feature or label smoothing lies at the core of Graph Neural Networks (GNNs). Concretely, they show that feature smoothing combined with simple linear regression achieves performance comparable to carefully designed GNNs, and that a simple MLP model with label smoothing of its predictions can outperform the vanilla GCN. Although this is an interesting finding, smoothing is still not well understood, especially regarding how to control the extent of smoothness. Intuitively, too few or too many smoothing iterations may cause under-smoothing or over-smoothing, respectively, and lead to sub-optimal performance. Moreover, the appropriate extent of smoothness is node-specific, depending on a node's degree and local structure. To this end, we propose a novel algorithm called node-dependent local smoothing (NDLS), which controls the smoothness of every node by setting a node-specific smoothing iteration number. Specifically, NDLS computes influence scores based on the adjacency matrix and selects the iteration number by setting a threshold on the scores. Once selected, the iteration number can be applied to both feature smoothing and label smoothing. Experimental results demonstrate that NDLS offers high accuracy (state-of-the-art performance on node classification tasks), flexibility (it can be incorporated into any model), and scalability and efficiency (it supports large-scale graphs with fast training).
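The idea of selecting a node-specific iteration number by thresholding influence scores can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define the influence score, so the sketch assumes a row-normalized adjacency matrix with self-loops, takes row i of its k-th power as node i's influence scores, and stops at the smallest k whose row lies within `eps` (L2 distance) of the stationary row. The function names, `eps`, and `max_iter` are hypothetical.

```python
import numpy as np

def node_smoothing_iterations(adj, eps=0.05, max_iter=50):
    """Per node, return the smallest k whose k-step influence row is within
    eps of the stationary row (assumed criterion), capped at max_iter."""
    n = adj.shape[0]
    a_tilde = adj + np.eye(n)            # add self-loops
    deg = a_tilde.sum(axis=1)
    trans = a_tilde / deg[:, None]       # row-normalized transition matrix
    stationary = deg / deg.sum()         # limit of each row for a connected graph

    iters = np.full(n, max_iter, dtype=int)
    done = np.zeros(n, dtype=bool)
    power = np.eye(n)
    for k in range(1, max_iter + 1):
        power = power @ trans            # power now holds trans^k
        dist = np.linalg.norm(power - stationary[None, :], axis=1)
        newly = (~done) & (dist < eps)
        iters[newly] = k
        done |= newly
        if done.all():
            break
    return iters, trans

def smooth_features(features, trans, iters):
    """Smooth each node's features with its own number of propagation steps."""
    out = np.empty_like(features, dtype=float)
    power = np.eye(trans.shape[0])
    for k in range(1, int(iters.max()) + 1):
        power = power @ trans
        rows = iters == k
        out[rows] = power[rows] @ features
    return out

# Usage on a small path graph 0-1-2-3 (hypothetical toy data).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
iters, trans = node_smoothing_iterations(adj)
smoothed = smooth_features(np.ones((4, 2)), trans, iters)
```

The dense matrix powers keep the sketch short; on large graphs one would use sparse matrix-vector products instead.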
Distributional Consistency Loss: Beyond Pointwise Data Terms in Inverse Problems
Webber, George, Reader, Andrew J.
Recovering true signals from noisy measurements is a central challenge in inverse problems spanning medical imaging, geophysics, and signal processing. Current solutions balance prior assumptions regarding the true signal (regularization) with agreement to noisy measured data (data-fidelity). Conventional data-fidelity loss functions, such as mean-squared error (MSE) or negative log-likelihood, seek pointwise agreement with noisy measurements, often leading to overfitting to noise. In this work, we instead evaluate data-fidelity collectively by testing whether the observed measurements are statistically consistent with the noise distributions implied by the current estimate. We adopt this aggregated perspective and introduce distributional consistency (DC) loss, a data-fidelity objective that replaces pointwise matching with distribution-level calibration using model-based probability scores for each measurement. DC loss acts as a direct and practical plug-in replacement for standard data consistency terms: i) it is compatible with modern regularizers, ii) it is optimized in the same way as traditional losses, and iii) it avoids overfitting to measurement noise even without the use of priors. Its scope naturally fits many practical inverse problems where the measurement-noise distribution is known and where the measured dataset consists of many independent noisy values. We demonstrate efficacy in two key example application areas: i) in image denoising with deep image prior, using DC instead of MSE loss removes the need for early stopping and achieves higher PSNR; ii) in medical image reconstruction from Poisson-noisy data, DC loss reduces artifacts in highly-iterated reconstructions and enhances the efficacy of hand-crafted regularization. These results position DC loss as a statistically grounded, performance-enhancing alternative to conventional fidelity losses for inverse problems.
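The contrast between pointwise and distribution-level data fidelity can be illustrated with a small sketch. It is not the DC loss as defined by the authors, whose exact form the abstract does not give. It assumes Gaussian noise with known `sigma`, uses each measurement's CDF value under the current estimate as its probability score, and measures how far the sorted scores are from uniform. All function names are hypothetical.

```python
import math
import numpy as np

# Standard normal CDF built from math.erf (avoids a SciPy dependency).
_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def gaussian_cdf_scores(measured, predicted, sigma):
    """Probability score of each measurement under N(predicted, sigma^2)."""
    return _norm_cdf((measured - predicted) / sigma)

def uniformity_discrepancy(scores):
    """Mean squared gap between sorted scores and uniform plotting positions.
    Small when the scores look like draws from Uniform(0, 1)."""
    u = np.sort(np.ravel(scores))
    n = u.size
    expected = (np.arange(1, n + 1) - 0.5) / n
    return float(np.mean((u - expected) ** 2))

# Hypothetical toy problem: a sine signal observed with Gaussian noise.
rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0.0, 6.0, 2000))
sigma = 0.3
measured = truth + sigma * rng.standard_normal(truth.shape)

# Estimate equal to the true signal: scores are close to uniform.
dc_at_truth = uniformity_discrepancy(gaussian_cdf_scores(measured, truth, sigma))
# Estimate equal to the noisy data (overfit): every score collapses to 0.5.
dc_at_overfit = uniformity_discrepancy(gaussian_cdf_scores(measured, measured, sigma))

# MSE prefers the overfit estimate: it is exactly zero there.
mse_at_truth = float(np.mean((measured - truth) ** 2))
mse_at_overfit = float(np.mean((measured - measured) ** 2))
```

In this toy setting the pointwise MSE is minimized by reproducing the noise, while the distribution-level discrepancy penalizes that estimate and favors the true signal.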